Growth of E. coli and P. putida in monoculture and cocultures

qPCR_data contains the result of species-specific quantitative PCR. It has five columns:

  • sample: Sample name
  • condition: monoculture and coculture names. The two-species coculture system has five different initial abundance, two of which were monocultures, and the other three had the initial ratio (EC/PP) of 1:1000, 1:1, 1000:1, respectively. For convenience, we named the P. putida monoculture, 1:1000, 1:1, 1000:1 cocultures and E. coli monoculture as related to the proportion of E. coli to five groups of “none,” “less,” “equal,” “more” and “all,” respectively.
  • time: sample time.
  • species: species name. In cocultures, the quantity was related to two species, respectively.
  • quantity: the quantity of species, as calculated from the qPCR CT value and standard curves.

For example:

qPCR_data <- read.csv("data/qPCR-data.csv")
qPCR_data$condition <- factor(qPCR_data$condition, 
                              levels = c("none","less","equal","more","all"))

head(qPCR_data)
##     sample condition time species  quantity
## 1 Sample 1       all    0      EC 150753360
## 2 Sample 1       all    0      EC 179451379
## 3 Sample 1       all    0      EC 197309047
## 4 Sample 1       all    0      EC 325275685
## 5 Sample 1       all    0      EC 389286192
## 6 Sample 1       all    0      EC 299844026
organism <- c("EC","PP")
organsim_fullname <- c("EC"="E. coli","PP"="P. putida")

Figure 1 shows Growth curves of E. coli and P. putida in monoculture (A) and the “1:1000,” “1:1,” “1000:1” cocultures (B-D). The quantities were determined using species-specific qPCR as described in methods. In B-D subplots, the growth curves of monocultures were placed on the background layer (dashed lines), showing the difference between monoculture and cocultures.

# stats
growth_data <- qPCR_data %>% group_by(species,time,condition)  %>%
  summarise(y=median(log10(quantity)),std=sd(log10(quantity),na.rm = T))

# separate monoculture and coculture
monoculture <-  growth_data %>%
  filter(condition %in% c("none","all"))
coculture <- growth_data %>%
  filter(condition %in% c("less","equal","more"))

# plot
growth_curve_mono <- ggplot(monoculture,aes(time,y,color=species)) +
  geom_line(lty="dashed",size=1) +
    geom_errorbar(aes(ymin=y-std,ymax=y+std),size=.5) + 
    geom_point(data = monoculture) 
growth_curves_coculture <- lapply(c("less","equal","more"), function(x){
  ggplot(mapping = aes(time,y,color=species)) + 
    geom_line(lty="dashed",size=1,alpha=1/3,data = monoculture) +
    geom_line(size=1,data=filter(coculture,condition==x)) +
    geom_errorbar(aes(ymin=y-std,ymax=y+std),size=.5,data = monoculture,alpha=1/3) + 
    geom_errorbar(aes(ymin=y-std,ymax=y+std),size=.5,data = filter(coculture,condition==x)) + 
    geom_point(data = monoculture,alpha=1/3) + 
    geom_point(data=filter(coculture,condition==x))
    
})

# plot enhancement
growth_curves <- c(list(growth_curve_mono), growth_curves_coculture)
growth_curves <- lapply(growth_curves, function(p){
  p + ylab("Log(quantity)") + xlab("Time (h)") +
        scale_y_continuous(limits = c(5,10.8),breaks = 5:10) +
        scale_color_discrete(labels=organsim_fullname,name="Species") +
        theme_bw() +
        theme(legend.position = c(0.7,0.2),
              legend.text = element_text(face = "italic"))
})

# show plot
plot_grid(plotlist = growth_curves,ncol=2,labels = "AUTO")
Growth curves of *E. coli* and *P. putida* in monoculture (A) and the 1:1000, 1:1, 1000:1 cocultures (B-D)

Figure 1: Growth curves of E. coli and P. putida in monoculture (A) and the 1:1000, 1:1, 1000:1 cocultures (B-D)

if (save_tiff) ggsave("figure 1.tiff",path="figures")
if(save_figure_to_ppt) export::graph2ppt(file="figures.pptx",append=TRUE)

Comparision of coculture and monoculture quantities

Figure 2 shows the variance analysis of the species abundances between monoculture and three cocultures. Results for E. coli (A-G) and P. putida (H-N) were given by time, as been illustrated on the top subtitle. In x-axis, mono represents the quantity of monoculture, and “1:1000,” “1:1,” “1000:1” represent the quantity of three cocultures. The significance of p-values were showed by “*” (p<0.05), “**” (p<0.01) or “ns” (p>0.05).

ref_group <- c("all","none")
library(ggtext)
plots <- lapply(seq_along(ref_group), function(i){
  ref <- ref_group[[i]]
  lapply(c(0,0.5,1,2,4,8,24), function(x){
   df <- qPCR_data %>% 
     filter(species == organism[[i]],time==x) %>% 
     dplyr::select(condition,quantity)
  df$condition <- relevel(df$condition, ref)
  ggplot(df,aes(condition,log10(quantity))) + 
    geom_boxplot() + 
    geom_jitter() + 
    scale_x_discrete(labels = c("mono","1:1000","1:1","1000:1")) +
    scale_y_continuous(expand = c(0,1)) +
    labs(subtitle = paste0("*",organsim_fullname[[organism[[i]]]],"* - ", x, "h")) +
    stat_compare_means(ref.group = ref,label = "p.signif") +
    theme(plot.subtitle = element_markdown()) 
  })
  
})
plots <- unlist(plots,recursive = F)
plot_grid(plotlist = plots,ncol=5,labels = "AUTO")
Analysis the variance of the species abundances between monoculture and three cocultures

Figure 2: Analysis the variance of the species abundances between monoculture and three cocultures

if(save_figure_to_ppt) export::graph2ppt(file="figures.pptx",append=TRUE)
if (save_tiff) ggsave("figure S1.tiff",path="figures")

Ratio (EC/PP) changes in cocultures

Ratios of EC/PP in cocultures were calculated, and ratio changes in cocultures were compared in Figure 3.

# calculate ratio in cocultures
ratio <- qPCR_data %>% 
  filter(condition %in% c("less","equal","more")) %>% 
  group_by(sample, condition, species, time) %>% 
  mutate(rep = row_number()) %>% 
  ungroup() %>% 
  pivot_wider(names_from = species, values_from = quantity) %>%
  mutate(ratio = EC/PP) %>%
  filter(!is.na(ratio))

Figure 3 shows the deterministic assembly of E. coli and P. putida cocultures. (A) the real-time ratio of EC: PP after 0, 0.5, 1, 2, 4, 8 and 24 h cultivation. (B) Analysis of the variances of EC/PP ratios (24h) between cocultures.

ratio.sum <- ratio %>% 
  group_by(sample) %>%
  mutate(y=mean(ratio,na.rm=TRUE),std=sd(ratio,na.rm=TRUE)) %>% 
  dplyr::select(sample,condition, time, y, std) %>%
  unique()
ratio.sum$condition <- factor(ratio.sum$condition,
                        levels = c("less","equal","more"),
                        labels = c("1:1000","1:1","1000:1")) 
plot_ratio <- ggplot(ratio.sum, aes(time,y,shape=condition,color=condition)) + 
  geom_rect(aes(xmin=23,xmax=25,ymin=0.02,ymax=0.3),
            fill="lightyellow",color="black",alpha=0.1) +
  geom_line(size=1,show.legend = F) +
  geom_point(size=2,show.legend = F) +
  geom_errorbar(aes(ymin=y-std,ymax=y+std),show.legend = F) +
  geom_text(aes(x=9,label=condition),hjust=0,vjust=c(0,0,1),
            data = filter(ratio.sum,time==8),
            show.legend = F) +

  # directlabels::geom_dl(aes(label=condition),method="smart.grid") +
  scale_y_log10(labels=formatC,breaks=10^(-3:3)) +
  labs(x="Time (h)", y="Ratio (EC/PP)") +
  # scale_x_continuous(limits = c(-5,NA)) +
  theme(legend.position = c(0.8,0.75))

ratio_24h <- ratio %>% filter(time==24)
plot_ratio_stats <- ggplot(ratio_24h,aes(condition,ratio,color=condition)) +
  geom_boxplot(fill="lightyellow") + 
  geom_jitter() + 
  stat_compare_means(
    comparisons = list(c("less","equal"),c("less","more"),c("equal","more")),
    label="p.format") +
   scale_x_discrete(breaks=c("less","equal","more"),
                    labels=c("1:1000","1:1","1000:1"))+
  xlab("Condition") + ylab("Ratio (EC/PP)") +
  theme(legend.position = "none",
        panel.background = element_rect(fill="lightyellow"
      ))

plot_grid(plot_ratio,plot_ratio_stats,labels = "AUTO",rel_widths = c(3,2))
Deterministic assembly of E. coli and P. putida cocultures. (A) the real-time ratio of EC: PP after 0, 0.5, 1, 2, 4, 8 and 24 h cultivation. (B) Analysis of the variances of EC/PP ratios (24h) between cocultures.

Figure 3: Deterministic assembly of E. coli and P. putida cocultures. (A) the real-time ratio of EC: PP after 0, 0.5, 1, 2, 4, 8 and 24 h cultivation. (B) Analysis of the variances of EC/PP ratios (24h) between cocultures.

if (save_tiff) ggsave("figure 2.tiff",path="figures")
if (save_figure_to_ppt) export::graph2ppt(file="figures.pptx",append=TRUE)

Notably, the ratio differences were non-significant by time in the logarithmic phase, but were all significant in the stationary phase for every coculture (Figure 4).

For each coculture, significance of variances between each sample were as follows:

  • the 1:1000 coculture
### stats
ratio1 <- ratio %>% filter(condition=="less")
(p1 <- pairwise.wilcox.test(ratio1$ratio,ratio1$time,p.adjust.method = "BH"))
## 
##  Pairwise comparisons using Wilcoxon rank sum exact test 
## 
## data:  ratio1$ratio and ratio1$time 
## 
##     0      0.5    1      2      4      8     
## 0.5 0.5359 -      -      -      -      -     
## 1   0.9372 0.4596 -      -      -      -     
## 2   0.0785 0.4062 0.1049 -      -      -     
## 4   0.1848 0.4596 0.1848 0.7341 -      -     
## 8   0.0045 0.0045 0.0045 0.0045 0.0045 -     
## 24  0.0045 0.0045 0.0045 0.0045 0.0045 0.1049
## 
## P value adjustment method: BH
  • the 1:1 coculture
ratio2 <- ratio %>% filter(condition=="equal")
(p2 <- pairwise.wilcox.test(ratio2$ratio,ratio2$time,p.adjust.method = "BH"))
## 
##  Pairwise comparisons using Wilcoxon rank sum exact test 
## 
## data:  ratio2$ratio and ratio2$time 
## 
##     0      0.5    1      2      4      8     
## 0.5 0.7727 -      -      -      -      -     
## 1   0.2902 0.2197 -      -      -      -     
## 2   0.7273 0.7922 0.3364 -      -      -     
## 4   0.6364 0.7922 0.3364 0.7727 -      -     
## 8   0.0051 0.0083 0.0051 0.0051 0.0051 -     
## 24  0.0051 0.0083 0.0051 0.0051 0.0051 0.0051
## 
## P value adjustment method: BH
  • and the 1000:1 coculture
ratio3 <- ratio %>% filter(condition=="more")
(p3 <- pairwise.wilcox.test(ratio3$ratio,ratio3$time,p.adjust.method = "BH"))
## 
##  Pairwise comparisons using Wilcoxon rank sum exact test 
## 
## data:  ratio3$ratio and ratio3$time 
## 
##     0      0.5    1      2      4      8     
## 0.5 0.2968 -      -      -      -      -     
## 1   0.6991 0.2358 -      -      -      -     
## 2   0.3636 0.1152 0.4500 -      -      -     
## 4   0.0070 0.0070 0.0130 0.3636 -      -     
## 8   0.0051 0.0051 0.0051 0.0070 0.0051 -     
## 24  0.0051 0.0051 0.0051 0.0070 0.0051 0.0051
## 
## P value adjustment method: BH

Figure 4 shows P-values of pairwise comparison of real-time EC/PP ratios in the “1:1000” (A), “1:1” (B) and “1000:1” (C) cocultures. Circles showed the adjusted p-value for the comparison of the EC/PP ratios between two samples, which were indicated on the top and left (large circle mean large p-value). On this plot, a cross mark was given if the p-value is non-significant (P > 0.05).

par(mfrow=c(1,3))
pvalues <- list(p1,p2,p3)
line = 0
cex = 1.2
side = 3
adj=0.15
plots <- lapply(seq_along(pvalues), function(i){
  x <- pvalues[[i]]
  corrplot(x$p.value, 
           type = "lower",
           col = "grey",
           cl.pos = "n",
           is.corr = FALSE, 
           method = "circle",
           p.mat = x$p.value,
           sig.level = 0.05)
  mtext(LETTERS[[i]], side = side, line = line, cex = cex, adj = adj)

})
P-values of pairwise comparison of real-time EC/PP ratios in the “1:1000” (A), “1:1” (B) and “1000:1” (C) cocultures.

Figure 4: P-values of pairwise comparison of real-time EC/PP ratios in the “1:1000” (A), “1:1” (B) and “1000:1” (C) cocultures.

if (save_figure_to_ppt) export::graph2ppt(file="figures.pptx",append=TRUE)
if (save_tiff) ggsave("figure S2.tiff",path="figures")

Gene expression analysis

To reveal the mechanism of community assembly in cocultures, we investigated the transcriptomic changes in cocultures using RNA-seq analysis.

Totally 60 samples were sequences. After sequencing quality control, each sample has 2.6 – 3.9 M paired reads, and 3.9 – 5.9 G base pairs, having an overall coverage of 300 X at least. After filtration, high-quality reads were aligned against the P. putida (https://www.ncbi.nlm.nih.gov/genome/?term=pseudomonas+putida+kt2440) and E.coli reference genome (https://www.ncbi.nlm.nih.gov/genome/?term=Escherichia+coli+K-12) using hisat2 and gene expression changes were quantified using DESeq2 software (Herberg et al. 2016; Love, Huber, and Anders 2014). While aligning to reference genomes, the overall aligned rates are ranging from 97.23% to 98.43%.

tableS1 <- read.xlsx("./data/tableS1.xlsx")
summary(tableS1)
##   Condition              Time       Sample          Raw_Read_Number   
##  Length:60          Min.   : 0   Length:60          Min.   :26331150  
##  Class :character   1st Qu.: 3   Class :character   1st Qu.:30196483  
##  Mode  :character   Median : 6   Mode  :character   Median :32429121  
##                     Mean   : 9                      Mean   :32696008  
##                     3rd Qu.:12                      3rd Qu.:34950975  
##                     Max.   :24                      Max.   :39576900  
##    Raw_Bases           Raw_N_rate       Raw_GC_content   Raw_Q20_rate  
##  Min.   :3.976e+09   Min.   :0.000965   Min.   :51.10   Min.   :95.75  
##  1st Qu.:4.560e+09   1st Qu.:0.002455   1st Qu.:51.74   1st Qu.:96.39  
##  Median :4.897e+09   Median :0.004755   Median :55.62   Median :96.56  
##  Mean   :4.937e+09   Mean   :0.005884   Mean   :55.54   Mean   :96.63  
##  3rd Qu.:5.278e+09   3rd Qu.:0.005086   3rd Qu.:59.09   3rd Qu.:96.95  
##  Max.   :5.976e+09   Max.   :0.016776   Max.   :60.07   Max.   :97.30  
##   Raw_Q30_rate   Trimmed_Read_Number Trimmed_Bases        Useful_read%  
##  Min.   :89.66   Min.   :26217232    Min.   :3.943e+09   Min.   :99.17  
##  1st Qu.:90.98   1st Qu.:29991340    1st Qu.:4.511e+09   1st Qu.:99.38  
##  Median :91.43   Median :32249039    Median :4.859e+09   Median :99.46  
##  Mean   :91.59   Mean   :32514927    Mean   :4.896e+09   Mean   :99.44  
##  3rd Qu.:92.41   3rd Qu.:34741486    3rd Qu.:5.234e+09   3rd Qu.:99.50  
##  Max.   :93.06   Max.   :39286830    Max.   :5.921e+09   Max.   :99.64  
##  Useful_bases%    align_rate%   
##  Min.   :98.40   Min.   :97.23  
##  1st Qu.:99.12   1st Qu.:97.98  
##  Median :99.21   Median :98.10  
##  Mean   :99.15   Mean   :98.03  
##  3rd Qu.:99.27   3rd Qu.:98.22  
##  Max.   :99.34   Max.   :98.43

Figure 5 shows the reads count of E. coli and P. putida in each RNA-seq library. Samples were taken with each condition at indicated times (at 0, 4, 8 and 24h). (A-D) P. putida monoculture samples; (E-H) E. coli and P. putida “1:1000” coculture; (I-L) the “1:1” coculture; (M-P) the “1000:1” coculture; and (Q-T) the E. coli monoculture samples. Each sample has three replicates. Only aligned reads to either E. coli or P. putida genome were used in this calculation. Plots showed the proportion of reads which have aligned to corresponding genome by different colors, as indicated on the top.

ht_counts <- readRDS(file = "./data/ht_counts.rds")
ht_counts$group <- factor(ht_counts$group,
                          levels = c("none_0h","none_4h","none_8h","none_24h","less_0h","less_4h","less_8h","less_24h","equal_0h","equal_4h","equal_8h","equal_24h","more_0h","more_4h","more_8h","more_24h","all_0h","all_4h","all_8h","all_24h"),
                          labels = c("P. putida_0h","P. putida_4h","P. putida_8h","P. putida_24h","1:1000_0h","1:1000_4h","1:1000_8h","1:1000_24h","1:1_0h","1:1_4h","1:1_8h","1:1_24h","1000:1_0h","1000:1_4h","1000:1_8h","1000:1_24h","E. coli_0h","E. coli_4h","E. coli_8h","E. coli_24h"))
ht_counts_total <- ht_counts %>% group_by(sample_id, group, organism) %>%
  summarise(sum_of_reads=sum(count)) %>%
  group_by(sample_id) %>% 
  mutate(proportion=sum_of_reads/sum(sum_of_reads))
samples <- levels(ht_counts_total$group)
plots <- lapply(1:length(samples),function(i){
  sample_group <- samples[[i]]
  df <- filter(ht_counts_total,group==sample_group)
  ggplot(df,aes(x=sample_id, y=proportion,fill=organism)) + 
    geom_bar(stat = "identity",position = "stack") + 
    scale_y_continuous(labels = function(l)paste(format(l*100,digits = 2),"%",sep="")) +
    scale_x_discrete(labels=c("Rep1","Rep2","Rep3")) +
    scale_fill_discrete(name="Organism: ",labels=c("EC"="E. coli","PP"="P. putida")) +
    labs(title = sample_group) +
    theme(legend.text = element_text(face = "italic"),
          legend.position = "none",
          axis.title = element_blank())
})
legend <- get_legend(plots[[1]] + theme(legend.position = "top",legend.direction = "horizontal"))
plot_grid(legend, plot_grid(plotlist = plots,labels = "AUTO",ncol=4),rel_heights = c(1,15),ncol=1)
Reads count of *E. coli* and *P. putida* in each RNA-seq library

Figure 5: Reads count of E. coli and P. putida in each RNA-seq library

if (save_tiff) ggsave("figure S3.tiff",path="figures")
if (save_figure_to_ppt) export::graph2ppt(file="figures.pptx",append=TRUE)

Identify gene expression changes

This step is a time consuming step. Use the precalculated DEG if possible.

# load precalculated DEG results.
dds.EC <- readRDS("data/dds.EC.2.rds")
dds.PP <- readRDS("data/dds.PP.2.rds")
DEG_results.EC <- readRDS("data/DEG_results.EC.rds")
DEG_results.PP <- readRDS("data/DEG_results.PP.rds")

RNA-seq clustering

First of all, we compared the gene expression profiles between monoculture and cocultures at genome level. Figure 6 shows the Principle coordinate analysis (PCA) of the gene expression profiles of E. coli (A) and P. putida (B) in different samples.

list_of_vsd <- lapply(list(dds.EC,dds.PP),function(dds){
  vst(dds,blind = F)
})
list_of_vsd[[1]]$ratio0 <- factor(list_of_vsd[[1]]$ratio0,
                                  levels = c("none","less","equal","more","all"),
                        labels = c("P. putida","1:1000","1:1","1000:1","E. coli"))
list_of_vsd[[2]]$ratio0 <- factor(list_of_vsd[[2]]$ratio0,
                                  levels = c("none","less","equal","more","all"),
                        labels = c("P. putida","1:1000","1:1","1000:1","E. coli"))

The results showed that the gene expression profiles were different between monoculture and cocultures in the beginning, and became more consistent in the later (Fig. 6A). Differences in gene expression appeared immediately after mixing of E. coli and P. putida, which has been described by 0h data (due to the time required for manual operation, this is of course not the real 0h). The expression profiles (0h) of E. coli monoculture, “1:1” and “1000:1” cocultures are close, but the “1:1000” coculture is distinct from the others. At 4h and 8h, we can still visually separate the “less” samples and the others on the plots. However, the expression profiles of E. coli became closer by time. After 24h cultivation, there was no obvious boundary between E. coli monoculture and three cocultures. Likewise, the expression profiles of P. putida has a similar pattern, except that the distance between “1000:1” coculture and other three samples was more obvious (Fig. 6B).

list_of_PCA_plot <- lapply(list_of_vsd, function(vsd) {
  myPlotPCA(vsd) + 
    facet_wrap(~time,ncol=4) +
    directlabels::geom_dl(aes(label=ratio0),method = "smart.grid",size=2) +  #文本代替标签 位置标注的不好,改size没用
    scale_color_manual(limits=c("E. coli","1:1000","1:1","1000:1","P. putida"),
                       values = brewer.pal(5,"Dark2")) +
    theme(legend.position = "none")
  })

plot_grid(plotlist = list_of_PCA_plot,labels = "AUTO",ncol=1)
Principle coordinate analysis (PCA) of the gene expression profiles of *E. coli* (A) and *P. putida* (B) in different samples.

Figure 6: Principle coordinate analysis (PCA) of the gene expression profiles of E. coli (A) and P. putida (B) in different samples.

if (save_tiff) ggsave("figure 3.tiff",path="figures")
if (save_figure_to_ppt) export::graph2ppt(file="figures.pptx",append=TRUE)

Count of DEGs

Figure 7 shows the Counts of differentially expressed genes (DEGs) in three cocultures, in E. coli (A) and P. putida (B). The DEGs were identified by comparison with the corresponding monoculture at the same time. Up- and down-regulation of genes were colored by red and cyan, respectively.

## count DEG
deg_count <- function(data){
  do.call("rbind",lapply(data, function(x) table(x$expression))) %>%
    as.data.frame() %>%
    rownames_to_column(var = "name") %>%
    separate(name, into = c("ratio","time"), sep = "_", extra = "drop") %>%
    mutate(time = as.numeric(str_extract(time, "[0-9]+"))) %>%
    pivot_longer(cols = c("dn","up"), names_to = "type", values_to = "count") %>%
    mutate(count = ifelse(type == "dn", -count, count)) %>%
    complete(ratio, time, type, fill = list(count = 0)) %>%
    mutate(ratio = factor(ratio, 
                          levels = c("less","equal","more"),
                          labels = c("1:1000","1:1","1000:1")),
           type = factor(type,
                         levels = c("up","dn"),
                         labels = c("Up","Down")))
}


deg_count_EC <- deg_count(DEG_results.EC)
deg_count_PP <- deg_count(DEG_results.PP)
count <- list(deg_count_EC, deg_count_PP)
plots <- lapply(seq_along(count), function(i){
  x <- count[[i]]
  ggplot(x, aes(x = time, y = count, color = type)) +
    geom_point() +
    geom_line(size = 1) +
    scale_y_continuous(labels = function(x){abs(x)}) +
    facet_wrap(~ratio) +
    labs(x="Time(h)",y="Count of DEGs",
         color = "Gene expression:",
         title= paste0("DEGs in *",organsim_fullname[[i]],"*")) +
    theme(legend.position = c(0.618,1),
        legend.justification = c(0.5,-0.65),
        legend.direction = "horizontal",
        plot.title = element_markdown())
}) 


plot_grid(plotlist = plots,labels = "AUTO",ncol = 1)
Counts of differentially expressed genes (DEGs) in three cocultures, in *E. coli* (A) and *P. putida* (B).

Figure 7: Counts of differentially expressed genes (DEGs) in three cocultures, in E. coli (A) and P. putida (B).

if (save_tiff) ggsave("figure 4.tiff",path="figures")
if (save_figure_to_ppt) export::graph2ppt(file="figures.pptx",append=TRUE)

Specific DEGs in E. coli and P. putida

Figure 8 shows the comparision of differentially expressed genes by time in E. coli (A-C) and P. putida (D-F). Although DEGs overlapped for different time, specific genes are the majority in almost every time point. For instance, 57 out of 93 DEGs are specific in 1:1000 coculture in E. coli (Fig. 8A).

library(ggVennDiagram)

ratio0 <- c("1:1000","1:1","1000:1")
deg_Venn_plot_EC <- lapply(seq_along(ratio0), function(i){
  gene_list <- lapply(DEG_results.EC[(i*4-3):(i*4)], function(x){x$gene})
  ggVennDiagram(gene_list,label = "count",
                category.names = c("0h","4h","8h","24h")) +
    scale_fill_gradient(low="white",high="red",limits=c(0,310)) +
    coord_equal() +
    labs(title=paste0(ratio0[[i]]," - *E. coli*")) +
    theme(legend.position = "none",
          plot.title = element_markdown(hjust=0.5))
})

deg_Venn_plot_PP <- lapply(seq_along(ratio0), function(i){
  gene_list <- lapply(DEG_results.PP[(i*4-3):(i*4)], function(x){x$gene})
  ggVennDiagram(gene_list,label = "count",
                category.names = c("0h","4h","8h","24h")) +
    scale_fill_gradient(low="white",high="red",limits=c(0,310)) +
    coord_equal() +
    labs(title=paste0(ratio0[[i]]," - *P. putida*")) +
    theme(legend.position = "none",
          plot.title = element_markdown(hjust=0.5))
})

plot_grid(plotlist = c(deg_Venn_plot_EC,deg_Venn_plot_PP),
          labels = "AUTO")
Comparision of differentially expressed genes by time

Figure 8: Comparision of differentially expressed genes by time

if (save_tiff) ggsave("figure S4.tiff",path="figures")
if (save_figure_to_ppt) export::graph2ppt(file="figures.pptx",append=TRUE)

Enrichment analysis of DEGs

We use clusterProfiler to perform KEGG enrichment analysis.

ck_plot <- function(ck, 
                    mapping = aes(time, Description, size = GeneRatio)){
  df <- data.frame(ck) %>%
    mutate(ratio = factor(ratio, levels = c("less","equal","more"),
                          labels = c("1:1000","1:1","1000:1")),
           time = factor(time, levels = c("0h","4h","8h","24h"))) 
  df$Description <- factor(df$Description, levels = kegg_path_tree(df,pathway_name = "Description", gene_id = "geneID"))
  df$GeneRatio <- sapply(df$GeneRatio, function(x) eval(parse(text = x)))
  ggplot(df, mapping = mapping) +
    geom_point() +
    facet_grid(~ ratio, scales = "free_y") +
    labs(y="KEGG pathway")
}


grid_panel_autoheight <- function(p){
  # 调整panel的高度
  require(gtable)
  gp <- ggplotGrob(p)
  # gtable_show_layout(gp)
  facet.rows <- gp$layout$t[grepl("panel",gp$layout$name)]
  y.var <- sapply(ggplot_build(p)$layout$panel_scales_y,
                  function(l) length(l$range$range))
  gp$heights[facet.rows] <- gp$heights[facet.rows] * y.var
  return(gp)
}

reset_facet_width <- function(p, rel_width = c(1)){
  gp <- ggplotGrob(p)
  facet.cols <- gp$layout$l[grepl("panel",gp$layout$name)]
  gp$widths[facet.cols] <- gp$widths[facet.cols] * rel_width
  return(gp)
}

kegg_path_tree <- function(df, pathway_name, gene_id, sep="/"){
  pathway <- df %>% 
    separate_rows(gene_id, sep = sep) %>%
    dplyr::select(c(pathway_name, gene_id)) %>%
    unique() %>%
    mutate(value = 1) %>%
    pivot_wider(id_cols = pathway_name,
                names_from = gene_id,
                values_from = value,
                values_fill = list(value = 0))
  library(vegan)
  matrix <- as.matrix(column_to_rownames(pathway, pathway_name))
  dist <- vegdist(matrix, method = "jaccard")
  
  library(ape)
  library(ggtree)
  tree <- bionj(dist)
  p <- ggtree(tree,branch.length = "none")
  p$data %>%
    filter(isTip) %>%
    arrange(y) %>%
    pull(label)
}
plot_grid(p1,p2, rel_heights = c(1,0.3), ncol = 1, labels = "AUTO",align = "v")
## Warning: Graphs cannot be vertically aligned unless the axis parameter is set.
## Placing graphs unaligned.
A dot plot shows the KEGG enrichment result of E. coli (A) and P. putida (B) in the three coculture as a function of time.

Figure 9: A dot plot shows the KEGG enrichment result of E. coli (A) and P. putida (B) in the three coculture as a function of time.

if (save_figure_to_ppt) export::graph2ppt(file="figures.pptx",append=TRUE)
if (save_tiff) ggsave("figure S5.tiff",path="figures")

Get set enrichment analysis (GSEA) of gene expression profile

Enrichment analysis of DEGs is a common approach in analyzing gene expression profiles. This approach will find genes where the difference is large, but it will not detect a situation where the difference is small, but evidenced in coordinated way in a set of related genes. Gene Set Enrichment Analysis (GSEA) (Subramanian et al. 2005) directly addresses this limitation. All genes can be used in GSEA; GSEA aggregates the per gene statistics across genes within a gene set, therefore making it possible to detect situations where all genes in a predefined set change in a small but coordinated way. Since it is likely that many relevant phenotypic differences are manifested by small but consistent changes in a set of genes.

# load pre-calculated result
gene_expression.EC <- readRDS("data/gene_expression.EC.rds")
gene_expression.PP <- readRDS("data/gene_expression.PP.rds")
# define a function to extract and prepare formatted data for GSEA
get_genelist <- function(x){
  if (nrow(x) < 1) return(NULL)
  geneList <- x$log2FoldChange
  names(geneList) <- x$gene
  geneList <- sort(geneList,decreasing = T) 
  return(geneList)
}

### *E. coli* GSEA KEGG result
gseKEGG_results.EC <- lapply(gene_expression.EC, function(x){
  geneList <- get_genelist(x)
  tryCatch(gseKEGG(geneList, organism = "eco",nPerm = 10000,minGSSize = 10),error=function(e) NULL)
})

### *P. putida* GSEA KEGG result
gseKEGG_results.PP <- lapply(gene_expression.PP, function(x){
  geneList <- get_genelist(x)
  tryCatch(gseKEGG(geneList, organism = "ppu",nPerm = 10000,minGSSize = 10),error=function(e) NULL)
})
# combine and reform GSEA result to a data frame
gse_result <- function(result){
  name <- names(result)
  l <- lapply(seq_along(result), function(i){
    data.frame(result[[i]]) %>%
      mutate(comparison = name[[i]])
  })
  do.call("rbind", l) %>%
    separate(comparison, into = c("ratio","time"), extra = "drop") %>%
    mutate(ratio = factor(ratio, levels = c("less","equal","more"),
                          labels = c("1:1000","1:1","1000:1")),
           time = factor(time, levels = c("0h","4h","8h","24h")),
           type = ifelse(enrichmentScore >0, "activated", "suppressed"),
           enrichScore = abs(enrichmentScore)) 
}

# dotplot GSEA result
gse_dotplot <- function(df){
  ggplot(df, aes(time, Description, size = enrichScore,color=type)) +
    geom_point() +
    facet_grid(~ ratio, scales = "free_y") +
    labs(y="KEGG pathway")
}
plot_grid(p1,p2,align = "v", ncol = 1,labels = "AUTO",rel_heights = c(1.5,1))
GSEA result of cocultures in E. coli (A) and P. putida (B)

Figure 10: GSEA result of cocultures in E. coli (A) and P. putida (B)

if (save_tiff) ggsave("figure 5.tiff",path="figures")

Figure 10 shows the GSEA analysis result for E. coli (A) and P. putida (B).

Overlap of GSEA pathway in E. coli and P. putida

# todo
merged_gsea <- rbind(mutate(df1, organism = "eco"), mutate(df2, organism = "ppu"))
merged_gsea <- merged_gsea %>% group_by(organism, type) %>% nest()
merged_gsea$pathway <- sapply(merged_gsea$data, function(x) pull(x, Description) %>% as.character())

Figure 11 shows the Overlap of GSEA pathway in E. coli and P. putida.

l <- merged_gsea$pathway
names(l) <- paste(merged_gsea$organism, merged_gsea$type,sep = "-")
p1 <- ggVennDiagram(l,label = "count")

l2 <- list(eco=c(l[[1]],l[[2]]), ppu=c(l[[3]],l[[4]]))
p2 <- ggVennDiagram(l2, label = "count")

l3 <- list(activated=c(l[[1]],l[[3]]), suppressed=c(l[[2]],l[[4]]))
p3 <- ggVennDiagram(l3, label = "count")

plot_grid(p1,plot_grid(p2,p3,ncol = 2,labels = c("B","C")),labels = c("A",""),ncol = 1,rel_heights = c(2,1))
Overlap of GSEA pathway in E. coli and P. putida. Pathways were distinguished by their descriptions, and separated to activated and suppressed ones.

Figure 11: Overlap of GSEA pathway in E. coli and P. putida. Pathways were distinguished by their descriptions, and separated to activated and suppressed ones.

if (save_figure_to_ppt) export::graph2ppt(file="figures.pptx",append=TRUE)
if (save_tiff) ggsave("figure 6.tiff",path="figures")
Herberg, Jethro A., Myrsini Kaforou, Victoria J. Wright, Hannah Shailes, Hariklia Eleftherohorinou, Clive J. Hoggart, Miriam Cebey-López, et al. 2016. “Diagnostic Test Accuracy of a 2-Transcript Host RNA Signature for Discriminating Bacterial Vs Viral Infection in Febrile Children.” JAMA 316 (8): 835. https://doi.org/10.1001/jama.2016.11236.
———, et al. 2016. “Diagnostic Test Accuracy of a 2-Transcript Host RNA Signature for Discriminating Bacterial Vs Viral Infection in Febrile Children.” JAMA 316 (8): 835. https://doi.org/10.1001/jama.2016.11236.
Love, Michael I, Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2.” Genome Biology 15 (12). https://doi.org/10.1186/s13059-014-0550-8.
Subramanian, Aravind, Pablo Tamayo, Vamsi K. Mootha, Sayan Mukherjee, Benjamin L. Ebert, Michael A. Gillette, Amanda Paulovich, et al. 2005. “Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.” Proceedings of the National Academy of Sciences of the United States of America 102 (43): 15545–50. https://doi.org/10.1073/pnas.0506580102.